Goto

Collaborating Authors

 conditional coverage


Conformalized Percentile Interval: Finite Sample Validity and Improved Conditional Performance

arXiv.org Machine Learning

Conformal prediction provides distribution-free predictive intervals with finite-sample marginal coverage. However, achieving conditional validity and interval efficiency (in terms of short interval length) remains challenging, particularly in complex settings with heteroskedasticity, skewed responses, or estimation errors. We propose a conformal-style calibration method for responses obtained by the probability integral transform (PIT) of the conditional cumulative distribution function (CDF) estimated via neural networks to construct a finite-sample-adjusted percentile interval with the shortest length determined by the estimated conditional CDF. Calibrating in PIT space is effective because PIT values are asymptotically feature-independent when the CDF estimator is accurate, which mitigates feature-dependent miscoverage and improves conditional calibration. On the other hand, our percentile calibration adapts to the empirical PIT distribution, which is robust against a possibly imperfect estimation of the conditional CDF. We prove the finite-sample marginal coverage property of the proposed method and show its asymptotic conditional coverage under mild consistency conditions. Experiments on diverse synthetic and real-world benchmarks demonstrate better conditional calibration and substantially shorter intervals than existing methods.


Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction

arXiv.org Machine Learning

Online Conformal Prediction (CP) struggles to balance temporal adaptability and structural stability. Feedback-driven methods (e.g., Adaptive Conformal Inference (ACI)) suffer from systemic marginal under-coverage and high interval variance during abrupt shifts, while temporally discounted Bayesian CP suffers from severe structural lag and uncalibrated interval bloat. We propose State-Adaptive Bayesian Conformal Prediction (SA-BCP) to achieve optimal spatio-temporal decoupling. By gating long-term temporal inertia with spatial kernel-density evidence, SA-BCP proactively expands intervals for recognized historical regimes while maintaining tight efficiency during stable states. We rigorously prove this mechanism's optimality, identifying a minimax bias-variance tradeoff governed by an evidence threshold $K$. Extensive benchmarks on volatile financial datasets (2016--2026), including AMD, Gold, and GBP/USD, demonstrate that SA-BCP consistently minimizes the strictly proper Winkler score across diverse confidence levels. Specifically, SA-BCP resolves the systematic under-coverage inherent to ACI variants while simultaneously reducing the uncalibrated interval bloat of Bayesian CP by 10\% to 37\% under high-confidence requests. By elegantly navigating this tradeoff, SA-BCP achieves an optimal balance between conditional reliability and predictive efficiency.


Conformal Prediction using Conditional Histograms

Neural Information Processing Systems

This paper develops a conformal method to compute prediction intervals for nonparametric regression that can automatically adapt to skewed data. Leveraging black-box machine learning algorithms to estimate the conditional distribution of the outcome using histograms, it translates their output into the shortest prediction intervals with approximate conditional coverage. The resulting prediction intervals provably have marginal coverage in finite samples, while asymptotically achieving conditional coverage and optimal length if the black-box model is consistent. Numerical experiments with simulated and real data demonstrate improved performance compared to state-of-the-art alternatives, including conformalized quantile regression and other distributional conformal prediction approaches.


Locally Valid and Discriminative Prediction Intervals for Deep Learning Models

Neural Information Processing Systems

Crucial for building trust in deep learning models for critical real-world applications is efficient and theoretically sound uncertainty quantification, a task that continues to be challenging. Useful uncertainty information is expected to have two key properties: It should be valid (guaranteeing coverage) and discriminative (more uncertain when the expected risk is high). Moreover, when combined with deep learning (DL) methods, it should be scalable and affect the DL model performance minimally. Most existing Bayesian methods lack frequentist coverage guarantees and usually affect model performance. The few available frequentist methods are rarely discriminative and/or violate coverage guarantees due to unrealistic assumptions. Moreover, many methods are expensive or require substantial modifications to the base neural network. Building upon recent advances in conformal prediction [13, 33] and leveraging the classical idea of kernel regression, we propose Locally Valid and Discriminative prediction intervals (LVD), a simple, efficient and lightweight method to construct discriminative prediction intervals (PIs) for almost any DL model. With no assumptions on the data distribution, such PIs also offer finite-sample local coverage guarantees (contrasted to the simpler marginal coverage). We empirically verify, using diverse datasets, that besides being the only locally valid method for DL, LVD also exceeds or matches the performance (including coverage rate and prediction accuracy) of existing uncertainty quantification methods, while offering additional benefits in scalability and flexibility.



Conformal Prediction with Time-Series Data via Sequential Conformalized Density Regions

arXiv.org Machine Learning

We propose a new conformal prediction method for time-series data with a guaranteed asymptotic conditional coverage rate, Sequential Conformalized Density Regions (SCDR), which is flexible enough to produce both prediction intervals and disconnected prediction sets, signifying the emergence of bifurcations. Our approach uses existing estimated conditional highest density predictive regions to form initial predictive regions. We then use a quantile random forest conformal adjustment to provide guaranteed coverage while adaptively changing to take the non-exchangeable nature of time-series data into account. We show that the proposed method achieves the guaranteed coverage rate asymptotically under certain regularity conditions. In particular, the method is doubly robust -- it works if the predictive density model is correctly specified and/or if the scores follow a nonlinear autoregressive model with the correct order specified. Simulations reveal that the proposed method outperforms existing methods in terms of empirical coverage rates and set sizes. We illustrate the method using two real datasets, the Old Faithful geyser dataset and the Australian electricity usage dataset. Prediction sets formed using SCDR for the geyser eruption durations include both single intervals and unions of two intervals, whereas existing methods produce wider, less informative, single-interval prediction sets.


Conformal Prediction Assessment: A Framework for Conditional Coverage Evaluation and Selection

arXiv.org Machine Learning

Conformal prediction provides rigorous distribution-free finite-sample guarantees for marginal coverage under the assumption of exchangeability, but may exhibit systematic undercoverage or overcoverage for specific subpopulations. Assessing conditional validity is challenging, as standard stratification methods suffer from the curse of dimensionality. We propose Conformal Prediction Assessment (CPA), a framework that reframes the evaluation of conditional coverage as a supervised learning task by training a reliability estimator that predicts instance-level coverage probabilities. Building on this estimator, we introduce the Conditional Validity Index (CVI), which decomposes reliability into safety (undercoverage risk) and efficiency (overcoverage cost). We establish convergence rates for the reliability estimator and prove the consistency of CVI-based model selection. Extensive experiments on synthetic and real-world datasets demonstrate that CPA effectively diagnoses local failure modes and that CC-Select, our CVI-based model selection algorithm, consistently identifies predictors with superior conditional coverage performance.


Generative Score Inference for Multimodal Data

arXiv.org Machine Learning

Accurate uncertainty quantification is crucial for making reliable decisions in various supervised learning scenarios, particularly when dealing with complex, multimodal data such as images and text. Current approaches often face notable limitations, including rigid assumptions and limited generalizability, constraining their effectiveness across diverse supervised learning tasks. To overcome these limitations, we introduce Generative Score Inference (GSI), a flexible inference framework capable of constructing statistically valid and informative prediction and confidence sets across a wide range of multimodal learning problems. GSI utilizes synthetic samples generated by deep generative models to approximate conditional score distributions, facilitating precise uncertainty quantification without imposing restrictive assumptions about the data or tasks. We empirically validate GSI's capabilities through two representative scenarios: hallucination detection in large language models and uncertainty estimation in image captioning. Our method achieves state-of-the-art performance in hallucination detection and robust predictive uncertainty in image captioning, and its performance is positively influenced by the quality of the underlying generative model. These findings underscore the potential of GSI as a versatile inference framework, significantly enhancing uncertainty quantification and trustworthiness in multimodal learning.


Shape-Adaptive Conditional Calibration for Conformal Prediction via Minimax Optimization

arXiv.org Machine Learning

Achieving valid conditional coverage in conformal prediction is challenging due to the theoretical difficulty of satisfying pointwise constraints in finite samples. Building upon the characterization of conditional coverage through marginal moment restrictions, we introduce Minimax Optimization Predictive Inference (MOPI), a framework that generalizes prior work by optimizing over a flexible class of set-valued mappings during the calibration phase, rather than simply calibrating a fixed sublevel set. This minimax formulation effectively circumvents the structural constraints of predefined score functions, achieving superior shape adaptivity while maintaining a principled connection to the minimization of mean squared coverage error. Theoretically, we provide non-asymptotic oracle inequalities and show that the convergence rate of the coverage error attains the optimal order under regular conditions. The MOPI also enables valid inference conditional on sensitive attributes that are available during calibration but unobserved at test time. Empirical results on complex, non-standard conditional distributions demonstrate that MOPI produces more efficient prediction sets than existing baselines.


LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees

arXiv.org Machine Learning

Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify uncertainty. Conformal prediction provides distribution-free marginal coverage, yet split conformal uses a single global residual quantile and can be poorly adaptive under heteroscedasticity. Methods that improve adaptivity typically fit auxiliary nuisance models or introduce additional data splits/partitions to learn the conformal score, increasing cost and reducing data efficiency. We propose LoBoost, a model-native local conformal method that reuses the fitted ensemble's leaf structure to define multiscale calibration groups. Each input is encoded by its sequence of visited leaves; at resolution level k, we group points by matching prefixes of leaf indices across the first k trees and calibrate residual quantiles within each group. LoBoost requires no retraining, auxiliary models, or extra splitting beyond the standard train/calibration split. Experiments show competitive interval quality, improved test MSE on most datasets, and large calibration speedups.